Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech

Authors

  • Roberto Seara
  • Marta Martinez
  • Rocío Varela
  • Carmen García-Mateo
  • Elisa Fernández Rei
  • Xose Luis Regueira

Abstract

The Corpus Oral Informatizado da Lingua Galega (CORILGA) project aims to build a corpus of spoken Galician, designed primarily for the study of linguistic variation and change. The project is still under development and is periodically enriched with new contributions. The long-term goal is for all speech recordings to carry ELAN-compliant phonetic, syllabic, morphosyntactic, lexical and sentence-level annotations. One way to speed up annotation is to use automatic tools based on speech recognition and tailored to the application. The CORILGA repository has therefore been enhanced with an automatic alignment tool, available to the repository administrator, that aligns speech with its orthographic transcription. When no transcription, or only a partial one, is available, a speech recognizer for Galician is used to generate word and phonetic segmentations. These recognized outputs may contain errors that the administrator must correct manually. To assist with this task, the tool also provides an ELAN tier containing a confidence measure for each recognized word. In this paper, after describing the main features of the CORILGA corpus, we present the speech alignment and recognition tools, both of which were developed with the Kaldi toolkit.
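The abstract does not show how recognizer output ends up as an ELAN confidence tier. As a minimal sketch, assuming word hypotheses arrive as CTM-style lines (`<utt> <channel> <start> <dur> <word> [<conf>]`, a format Kaldi recipes commonly emit), the Python below builds a reduced EAF document with a word tier and a parallel confidence tier. The tier names `words` and `confidence`, the `low_conf` threshold, the `?` flag and the helpers `parse_ctm` and `to_eaf` are all hypothetical; this is not the authors' implementation, and the EAF skeleton keeps only the elements needed for time-aligned tiers.

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class WordHyp:
    start: float       # start time in seconds
    duration: float    # duration in seconds
    word: str
    confidence: float  # assumed to lie in [0, 1]


def parse_ctm(lines):
    """Parse CTM lines: <utt> <channel> <start> <dur> <word> [<conf>]."""
    hyps = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        conf = float(fields[5]) if len(fields) > 5 else 1.0
        hyps.append(WordHyp(float(fields[2]), float(fields[3]), fields[4], conf))
    return hyps


def to_eaf(hyps, media_url, low_conf=0.5):
    """Return a reduced ELAN EAF document (as a string) with two time-aligned
    tiers: the recognized words and their confidences. Confidences below
    `low_conf` (hypothetical threshold) get a '?' suffix for the reviewer."""
    root = ET.Element("ANNOTATION_DOCUMENT", {
        "AUTHOR": "",
        "DATE": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "FORMAT": "3.0",
        "VERSION": "3.0",
    })
    header = ET.SubElement(root, "HEADER",
                           {"MEDIA_FILE": "", "TIME_UNITS": "milliseconds"})
    ET.SubElement(header, "MEDIA_DESCRIPTOR",
                  {"MEDIA_URL": media_url, "MIME_TYPE": "audio/x-wav"})
    # TIME_ORDER precedes the tiers in EAF; its slots are appended as we go.
    time_order = ET.SubElement(root, "TIME_ORDER")
    tiers = {
        name: ET.SubElement(root, "TIER", {"TIER_ID": name,
                                           "LINGUISTIC_TYPE_REF": "default-lt"})
        for name in ("words", "confidence")
    }
    slot_ids = itertools.count(1)
    ann_ids = itertools.count(1)

    def add_annotation(tier, start_ms, end_ms, value):
        # Each annotation gets its own pair of time slots.
        refs = []
        for t in (start_ms, end_ms):
            slot = f"ts{next(slot_ids)}"
            ET.SubElement(time_order, "TIME_SLOT",
                          {"TIME_SLOT_ID": slot, "TIME_VALUE": str(t)})
            refs.append(slot)
        ann = ET.SubElement(ET.SubElement(tier, "ANNOTATION"),
                            "ALIGNABLE_ANNOTATION",
                            {"ANNOTATION_ID": f"a{next(ann_ids)}",
                             "TIME_SLOT_REF1": refs[0],
                             "TIME_SLOT_REF2": refs[1]})
        ET.SubElement(ann, "ANNOTATION_VALUE").text = value

    for h in hyps:
        start_ms = round(h.start * 1000)
        end_ms = round((h.start + h.duration) * 1000)
        add_annotation(tiers["words"], start_ms, end_ms, h.word)
        flag = "?" if h.confidence < low_conf else ""
        add_annotation(tiers["confidence"], start_ms, end_ms,
                       f"{h.confidence:.2f}{flag}")

    # Linguistic types follow the tiers in the EAF element order.
    ET.SubElement(root, "LINGUISTIC_TYPE", {
        "LINGUISTIC_TYPE_ID": "default-lt",
        "TIME_ALIGNABLE": "true",
        "GRAPHIC_REFERENCES": "false",
    })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    ctm = ["rec01 1 0.12 0.38 bos 0.97", "rec01 1 0.51 0.32 días 0.41"]
    print(to_eaf(parse_ctm(ctm), "file:///path/to/rec01.wav"))
```

Keeping confidences in a separate tier with the same time boundaries as the words lets a reviewer scan both side by side and jump to flagged items. A file intended for ELAN proper would normally also carry the EAF schema attributes, and should be written out as UTF-8 so that Galician characters such as `í` survive.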

Similar resources

EasyAlign: An Automatic Phonetic Alignment Tool Under Praat

We provide a user-friendly automatic phonetic alignment tool for continuous speech, named EasyAlign. It is developed as a plug-in of Praat, the popular speech analysis software, and it is freely available. Its main advantage is that one can easily align speech from an orthographic transcription. It requires a few minor manual steps and the result is a multi-level annotation within a TextGrid co...

EasyAlign: a friendly automatic phonetic alignment tool under Praat

We propose a user-friendly automatic phonetic alignment tool for continuous speech: EasyAlign. It is developed and freely distributed as a plug-in of Praat, the popular speech analysis software. Its main advantage is that one can easily align speech from an orthographic transcription. It requires a few minor manual steps and the result is a multi-level annotation within a TextGrid composed of p...

CoALT: A Software for Comparing Automatic Labelling Tools

Speech-text alignment tools are frequently used in speech technology and research. In this paper, we propose a GPL software CoALT (Comparing Automatic Labelling Tools) for comparing two automatic labellers or two speech-text alignment tools, ranking them and displaying statistics about their differences. The main feature of CoALT is that a user can define their own criteria for evaluating and com...

A Comparison of Different Approaches to Automatic Speech Segmentation

We compare different methods for obtaining accurate speech segmentations starting from the corresponding orthography. The complete segmentation process can be decomposed into two basic steps. First, a phonetic transcription is automatically produced with the help of large vocabulary continuous speech recognition (LVCSR). Then, the phonetic information and the speech signal serve as input to a s...

Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech

We propose in this work a novel acoustic phonetic study for Arabic people suffering from language disabilities and non-native learners of Arabic language to classify Arabic continuous speech to pathological or healthy and to identify phonemes that pose pronunciation problems (case of pathological speeches). The main idea can be summarized in comparing between the phonetic model reference to Ara...

Publication date: 2016